1. INTRODUCTION

Nowadays, temperature management of multi-core processors has become a critical problem for developers due to the increase in computing power and integration density. Power consumption generates heat that increases chip temperature, and high temperature decreases the reliability and lifetime of the chips. Therefore, a solution for proper temperature and performance management in multi-core processors is essential [1], [2].

To control chip temperature, embedded systems usually use dynamic thermal management (DTM) techniques. These techniques usually adjust the processor voltage and frequency to control power consumption and temperature [3]-[5]. Many thermal management techniques have been proposed for modern chips because of increased power densities and their reliability implications. When the junction or skin temperature exceeds a safe value, the cores must reduce their temperature by lowering power consumption. Therefore, the lack of a dynamic thermal-power management (DTPM) algorithm can lead to reduced performance [6], [7]. For this reason, a trade-off must be made between maintaining performance and controlling temperature [8].

Thermal management techniques can be divided into two categories: physical techniques and techniques based on control theory. The initial, simple physical solution to temperature rises in multi-core processors was to add a heatsink to disperse the heat generated in the processor. Sahoo et al. [9] reported different heatsink arrangements and their construction elements. Choi et al. [10] investigated the performance of an active central processing unit (CPU) cooling heatsink with heat pipes. Siricharoenpanich et al. [11] studied the impact of the inclination angle of CPU heat pipes. Yousefi et al. [12] presented an experimental study of the heat transfer performance of a CPU cooling heat pipe, examining the effects of inclination angle and nanofluids. Subsequent technologies added water cooling and other forms of liquid cooling. Nazari et al. [13] compared the cooling performance of common base fluids. One of the most basic forms of DTM is known as “stop-go” [6]. Although it is very effective in controlling temperature, it reduces efficiency dramatically. In the second category, methods from control theory such as proportional-integral-derivative (PID) control [14], model predictive control (MPC) [15], stochastic control [16], nonlinear control [17], and fuzzy control [18]-[20] are used to control chip temperature. In [21]-[23], frequency and fan speed controllers have been designed, but the control goals of temperature and performance have not been examined simultaneously. Fu et al. [24] achieved both control goals, but their thermal model is very difficult to extract, and the parameters required to extract it are not available on most processors.

In most of the control techniques mentioned above, the core frequency and the fan speed were not considered simultaneously in the controller design, or the trade-off between performance and temperature control was missing. Therefore, in our proposed solution, we consider the core frequency and the fan speed as the control variables in order to make a good compromise between performance and temperature control. We use optimal control because the control variables are constrained, and this method lets us define a trade-off between temperature and performance while respecting those constraints.

The rest of this paper is organized as follows: an overview of the proposed DTPM technique is provided in Section 2. The power and thermal models and the controller design are described in Section 3. Experimental evaluation on the Exynos 5422 processor of the ODROID-XU4 board is presented in Section 4. Finally, conclusions are drawn in Section 5.


2. PROPOSED FRAMEWORK 

Effective management of performance and temperature in multi-processor systems depends critically on accurate analytical models that can be evaluated at run-time [25]. We use the ODROID-XU4 board with the Exynos 5422 processor, which has 4 cores in a big cluster and 4 cores in a little cluster. On this board, the frequencies of the big and little cores, the temperature of the big cores, the CPU utilization of the big and little cores, and the fan speed can be accessed and observed. Temperature sensors are located only on the cores of the big cluster, and each temperature sample is the average temperature of these cores (the accuracy of each core's temperature sensor is 1 °C). Note also that this board is an asymmetric (big.LITTLE) multi-core processor whose cores share one frequency per cluster: changing the frequency of one core in a cluster changes the frequency of all cores in that cluster.

Power cannot be measured directly on the ODROID-XU4; therefore, we calculate the power consumption of the cores indirectly. In other words, we aim for an appropriate estimate of power consumption. The main heat-generating elements in the ODROID-XU4 are the A7 and A15 clusters (the little and big clusters), so our work focuses on these two clusters and their thermal model.

We use the framework shown in Figure 1. First, the measurable data of the A7 and A15 cores, including the operating frequency of the cores, the temperature of the cores, and the estimated power during execution of the benchmark programs, are collected. These data are used to build the power and thermal models. In the next step, we use these models to obtain the state-space description and design the optimal controllers.


[Figure 1 flowchart: Data accessible → Power model ($P_{\text{total}} = P_0 + uP_1 + u^2P_2$; $P_{\text{total}} = P_{\text{dynamic}} + P_{\text{static}}$; $P_{\text{total}} = \alpha C V^2 f + P_{\text{static}}$; $P_{\text{static}}$ = constant) → Thermal model ($T[k+1] = AT[k] + BP[k]$) → Standard model ($T[k+1] = AT[k] + Bu[k]$, $P[k] = f(u)$, $u$: control variable) → Design controllers]

Figure 1. Proposed framework
3. MODEL GENERATION

Effective power and temperature management depends critically on accurate analytical models that can be evaluated at runtime. To this end, model generation is divided into two parts, the power model and the thermal model, which are described below.


3.1. Power model 

Power consumption in integrated circuits is the sum of two contributions: dynamic and static. The dynamic power can be expressed as in (2), where α and C are the activity factor and the switching capacitance, V is the supply voltage, and f is the clock frequency [26]:


$$P_{\text{total}} = P_{\text{dynamic}} + P_{\text{static}} \tag{1}$$
$$P_{\text{total}} = \alpha C V^2 f + P_{\text{static}} \tag{2}$$


The total power is calculated indirectly from CPU utilization, a simple measure of the amount of computation a CPU performs per unit time. Walker et al. [27] used this method to calculate the total power of the ODROID-XU3 board. The ODROID-XU3 has the same hardware specifications as the ODROID-XU4, and the only difference between the two boards is that the XU3 has a voltage sensor. In [27], the total power is calculated as:


$$P_{\text{total}} = P_0 + uP_1 + u^2P_2 \tag{3}$$


where u represents CPU utilization and the coefficients P_i for calculating total power are given in Table 1. Table 1 does not provide data for some operating frequencies, such as 200 MHz and 1,700 MHz. Therefore, using the MATLAB curve fitting toolbox, we obtain a relationship with good accuracy for each coefficient as a function of frequency. Each relation and its accuracy (R²) are given in Table 2.
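To make (3) concrete, the sketch below evaluates the total power for a tabulated frequency using three rows copied from Table 1. It assumes that u is CPU utilization in percent (0 to 100); the text does not state the unit, and the function name `total_power` is illustrative only.

```python
# Sketch of the utilization-based power model (3): P_total = P0 + u*P1 + u^2*P2.
# Assumption: u is CPU utilization in percent (0-100); the text does not state the unit.

# A few (P0, P1, P2) rows copied from Table 1, keyed by frequency in MHz.
TABLE1 = {
    250: (0.010162804, 0.001572937, -1.6e-06),
    1000: (0.05151296, 0.022298895, -2.31e-05),
    1600: (0.113207437, 0.071046296, -0.000356278),
}


def total_power(freq_mhz: int, utilization: float) -> float:
    """Evaluate (3) for a tabulated frequency; raises KeyError otherwise."""
    if not 0.0 <= utilization <= 100.0:
        raise ValueError("utilization must be in [0, 100] percent")
    p0, p1, p2 = TABLE1[freq_mhz]
    return p0 + utilization * p1 + utilization ** 2 * p2


if __name__ == "__main__":
    for f in sorted(TABLE1):
        print(f, "MHz, u = 50%:", round(total_power(f, 50.0), 4), "W")
```

Frequencies missing from Table 1 would be handled by the fitted relations of Table 2 rather than by this lookup.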


Table 1. Values of P_i coefficients at different frequencies [27]

Freq (MHz)   P0            P1            P2
250          0.010162804   0.001572937   -1.6E-06
300          0.009521533   0.001955864   -2.8E-06
350          0.010338362   0.002260855   -3.27E-06
400          0.012404617   0.002786165   -4.16E-06
450          0.014982476   0.003470516   -5.15E-06
500          0.019350116   0.004348964   -6.59E-06
550          0.023879147   0.005364183   -8.19E-06
600          0.029694786   0.006573153   -1.04E-05
800          0.036437558   0.015975673   -2.45E-05
900          0.043569437   0.018618245   -2.36E-05
1,000        0.05151296    0.022298895   -2.31E-05
1,100        0.06267777    0.026027083   -1.69E-05
1,200        0.071779829   0.029651724   -9.56E-06
1,300        0.082334714   0.03450953    8.47E-06
1,400        0.092116725   0.046563344   -9.22E-05
1,500        0.100661815   0.060414109   -0.000231806
1,600        0.113207437   0.071046296   -0.000356278


Table 2. P_i coefficients in terms of frequency

Coefficient   Equation                                                                                     R²
P0            $2.939\times10^{-8} f^2 + 2.384\times10^{-5} f + 0.0001241$                                  0.9971
P1            $7.963\times10^{-11} f^{2.78} + 0.002045$                                                    0.9879
P2            $-9.871\times10^{-5} + 0.0001559\cos(\omega f) + 0.000173\sin(\omega f) + 0.0001407\cos(2\omega f) - 0.0001495\sin(2\omega f) - 6.48\times10^{-5}\cos(3\omega f) - 5.355\times10^{-5}\sin(3\omega f)$, with $\omega = 0.002327$   0.9803

R² is a measure of the goodness of model fitting (f in MHz).
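The fitted relations in Table 2 can be checked against Table 1. The sketch below codes the three fits (reading P2 as a three-term Fourier series in ωf, 2ωf, 3ωf), evaluates the R² of the tabulated P0 fit, and refits a quadratic P0(f) with `numpy.polyfit` as an independent cross-check. This is a sketch for verification; it is not the MATLAB procedure used in the text.

```python
import math

import numpy as np

# Table 1 data: frequency in MHz and the P0 column.
FREQS = np.array([250, 300, 350, 400, 450, 500, 550, 600, 800, 900,
                  1000, 1100, 1200, 1300, 1400, 1500, 1600], dtype=float)
P0_TABLE = np.array([0.010162804, 0.009521533, 0.010338362, 0.012404617,
                     0.014982476, 0.019350116, 0.023879147, 0.029694786,
                     0.036437558, 0.043569437, 0.05151296, 0.06267777,
                     0.071779829, 0.082334714, 0.092116725, 0.100661815,
                     0.113207437])

OMEGA = 0.002327  # angular frequency of the Fourier fit for P2 (Table 2)


def p0_fit(f):
    return 2.939e-8 * f**2 + 2.384e-05 * f + 0.0001241


def p1_fit(f):
    return 7.963e-11 * f**2.78 + 0.002045


def p2_fit(f):
    w = OMEGA * f
    return (-9.871e-05
            + 0.0001559 * math.cos(w) + 0.000173 * math.sin(w)
            + 0.0001407 * math.cos(2 * w) - 0.0001495 * math.sin(2 * w)
            - 6.48e-05 * math.cos(3 * w) - 5.355e-05 * math.sin(3 * w))


def r_squared(y, y_hat):
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    return 1.0 - ss_res / ss_tot


# Refit the quadratic P0(f) by ordinary least squares as a cross-check.
coeffs = np.polyfit(FREQS, P0_TABLE, deg=2)
r2_refit = r_squared(P0_TABLE, np.polyval(coeffs, FREQS))
r2_table2 = r_squared(P0_TABLE, p0_fit(FREQS))

if __name__ == "__main__":
    print("refit coefficients (f^2, f, 1):", coeffs)
    print("R^2 refit:", round(r2_refit, 4), " R^2 Table 2 P0:", round(r2_table2, 4))
    # The fits fill gaps in Table 1, e.g. 200 MHz and 1,700 MHz.
    for f in (200.0, 1700.0):
        print(f, p0_fit(f), p1_fit(f), p2_fit(f))
```

Because both the refit and the Table 2 P0 relation are quadratics, the least-squares refit can only match or exceed the tabulated R².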


To obtain the core voltages, we use the table given in [28] and the MATLAB curve fitting toolbox to derive the voltage as a function of frequency (the R² values are above 0.99):


$$V_7 = 0.0000001963 f_7^2 + 0.000004489 f_7 + 0.8886 \tag{4}$$


$$V_{15} = 0.0000001487 f_{15}^2 - 0.0001167 f_{15} + 0.9319 \tag{5}$$
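The voltage fits (4) and (5) feed directly into the dynamic term αCV²f of (2). The sketch below codes both fits, with f in MHz as in Table 1, and a helper for the dynamic term. The product αC is a hypothetical placeholder argument, since its value is not given.

```python
# Sketch: per-cluster voltage models (4)-(5) and the dynamic power term alpha*C*V^2*f of (2).
# f is in MHz, as in (4)-(5). alpha_c (activity factor times switching capacitance)
# is a hypothetical placeholder; no value is given for it in the text.

def v_a7(f_mhz: float) -> float:
    """Little-cluster (A7) voltage from (4)."""
    return 0.0000001963 * f_mhz**2 + 0.000004489 * f_mhz + 0.8886


def v_a15(f_mhz: float) -> float:
    """Big-cluster (A15) voltage from (5)."""
    return 0.0000001487 * f_mhz**2 - 0.0001167 * f_mhz + 0.9319


def dynamic_power(alpha_c: float, voltage: float, f_mhz: float) -> float:
    """alpha*C*V^2*f; the units follow whatever units alpha_c is given in."""
    return alpha_c * voltage**2 * f_mhz


if __name__ == "__main__":
    for f in (200, 800, 1400):
        print("A7 ", f, "MHz ->", round(v_a7(f), 4), "V")
    for f in (200, 1100, 2000):
        print("A15", f, "MHz ->", round(v_a15(f), 4), "V")
```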
3.2. Thermal model
The thermal model can be presented as (6) [29]:


$$T[k+1] = AT[k] + BP[k] \tag{6}$$


T[k] and T[k+1] are the measured temperature at the current sample and at the next sample, respectively. To obtain the model coefficients, power consumption and temperature are sampled under different workloads generated by a range of benchmarks. These benchmarks, which stress the power consumption and performance of the cores, include MiBench, LMbench, MediaBench, and scripts written by the user; 14 different tasks selected from these sets are used.

During the evaluation, we first bring the temperature to 40 °C and then start sampling. To prevent damage to the board, we set a temperature threshold of 80 °C; above this temperature, the fan cools at maximum power.

To ensure that both clusters are well represented in the model, we first hold one cluster at its minimum frequency and vary the frequency of the other cluster under different workloads in steps of 200 MHz from the lowest to the highest frequency. At each frequency step, the temperature and estimated power are recorded. This is repeated for the other cluster, and sampling is done at 0%, 50%, and 100% of fan speed. After sampling the data, we obtain the unknown coefficients from the following relationships:


$$Y = X\theta \tag{7}$$
where (one row of X per sample):
$$X_{m\times4} = \begin{bmatrix} T[k] & P_7[k] & P_{15}[k] & u_{\text{fan}}[k] \end{bmatrix}, \quad \theta_{4\times1} = \begin{bmatrix} A & B_7 & B_{15} & B_{\text{fan}} \end{bmatrix}^{T}, \quad Y_{m\times1} = T[k+1] \tag{8}$$


A is the current-temperature coefficient, B_7 is the little-cluster power coefficient, B_15 is the big-cluster power coefficient, B_fan is the fan-speed coefficient, P_7 and P_15 are the power consumption of the little and big clusters, u_fan is the fan speed, and m is the number of data samples. Since only the big-cluster temperature is observable, T[k] is the big-cluster temperature and carries no cluster index. The unknown coefficients are computed with the least-squares method:


$$X^{T}X\theta = X^{T}Y \tag{9}$$
$$\theta = (X^{T}X)^{-1}X^{T}Y \tag{10}$$
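The identification in (7) to (10) is ordinary least squares on the regressor matrix X. The sketch below exercises it on synthetic data. The "true" coefficients `THETA_TRUE`, the signal ranges, and the noise level are hypothetical values chosen only to exercise the estimator; they are not the measured data, and the real sensor has 1 °C resolution.

```python
import numpy as np

# Sketch of the least-squares identification (7)-(10) on synthetic data.
# THETA_TRUE, the signal ranges and noise_std are hypothetical.
THETA_TRUE = np.array([0.98, 0.07, 0.04, -0.001])  # [A, B_7, B_15, B_fan]


def simulate(m: int = 2000, seed: int = 0, noise_std: float = 0.01):
    """Build X (m x 4) and Y (m,) from the discrete model (6) plus sensor noise."""
    rng = np.random.default_rng(seed)
    t_k = rng.uniform(40.0, 80.0, m)               # big-cluster temperature [deg C]
    p7 = rng.uniform(0.0, 1.5, m)                  # little-cluster power [W]
    p15 = rng.uniform(0.0, 5.0, m)                 # big-cluster power [W]
    u_fan = rng.choice([0.0, 127.0, 255.0], size=m)  # fan PWM at 0%, ~50%, 100%
    X = np.column_stack([t_k, p7, p15, u_fan])
    Y = X @ THETA_TRUE + rng.normal(0.0, noise_std, m)
    return X, Y


def identify(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """theta = (X^T X)^{-1} X^T Y, solved through the normal equations (9)."""
    return np.linalg.solve(X.T @ X, X.T @ Y)


if __name__ == "__main__":
    X, Y = simulate()
    theta = identify(X, Y)
    # np.linalg.lstsq is the numerically safer equivalent of (10).
    theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print("normal equations:", theta)
    print("lstsq:           ", theta_lstsq)
```

Solving the normal equations mirrors (9) literally; for badly scaled regressors, `np.linalg.lstsq` (QR/SVD based) is the more robust choice and gives the same estimate here.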


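Given the sampling time Δt, the discrete-time model can be converted to a continuous-time model [30]:


$$T[k+1] = (1 + a\Delta t)T[k] + (B_7\Delta t)P_7[k] + (B_{15}\Delta t)P_{15}[k] + (B_{\text{fan}}\Delta t)u_{\text{fan}}[k] \tag{11}$$
$$\frac{T[k+1] - T[k]}{\Delta t} = aT[k] + B_7P_7[k] + B_{15}P_{15}[k] + B_{\text{fan}}u_{\text{fan}}[k] \tag{12}$$
$$\dot{T}(t) = aT(t) + B_7P_7(t) + B_{15}P_{15}(t) + B_{\text{fan}}u_{\text{fan}}(t) \tag{13}$$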
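Relation (11) is a forward-Euler discretization. Inverting it gives the continuous coefficients directly: a = (A_d - 1)/Δt and B = B_d/Δt. The sketch below implements that inversion. For comparison only, it also includes the exact zero-order-hold inverse, which is a swapped-in alternative and not the method of (11). The sampling time of 1 s in the usage example is hypothetical.

```python
import math

# Sketch: recover continuous-time coefficients from identified discrete ones.
# forward_euler() inverts (11): A_d = 1 + a*dt and B_d = B*dt.
# zoh_exact() is a swapped-in alternative (exact zero-order-hold inverse),
# shown for comparison only; it is not the method used in (11).


def forward_euler(a_d: float, b_d: list[float], dt: float):
    a = (a_d - 1.0) / dt
    return a, [b / dt for b in b_d]


def zoh_exact(a_d: float, b_d: list[float], dt: float):
    """Scalar ZOH: A_d = exp(a*dt), B_d = B*(exp(a*dt) - 1)/a. Requires a_d > 0."""
    if abs(a_d - 1.0) < 1e-15:           # a == 0, so B_d = B*dt
        return 0.0, [b / dt for b in b_d]
    a = math.log(a_d) / dt
    scale = a / (a_d - 1.0)
    return a, [b * scale for b in b_d]


if __name__ == "__main__":
    dt = 1.0  # hypothetical sampling time in seconds
    a_d, b_d = 1.00176397, [0.07258447, 0.03656092, -0.00095002]
    print("Euler:", forward_euler(a_d, b_d, dt))
    print("ZOH:  ", zoh_exact(a_d, b_d, dt))
```

When aΔt is small, as here, the two inversions agree to within a fraction of a percent, which is why the simpler Euler relation is adequate.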


Finally, the continuous-time relationship can be expressed as (14):
$$\dot{T}(t) = 0.00176397\,T(t) + 0.07258447\,P_7 + 0.03656092\,P_{15} - 0.00095002\,u_{\text{fan}} \tag{14}$$


To obtain the standard state-space form, P_7 and P_15 must be expressed in terms of frequency. Substituting (2) and simplifying, the final form of the model is (15):


$$T[k+1] = AT[k] + \begin{bmatrix} B_7 & B_{15} \end{bmatrix} \begin{bmatrix} \alpha C_7 V_7^2 f_7 + P_{s7} \\ \alpha C_{15} V_{15}^2 f_{15} + P_{s15} \end{bmatrix} + B_{\text{fan}}u_{\text{fan}}[k] \tag{15}$$

The product V²f in the dynamic power terms of P_7 and P_15 is linearized as shown in Table 3.


Table 3. Linear approximation of the dynamic-power product V²f

Coefficient       V_7² f_7            V_15² f_15
Linear equation   1.708 f_7 - 385.2   1.648 f_15 - 499.9
R²                0.9547              0.9372
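The linearization in Table 3 can be reproduced approximately by fitting a least-squares line through V²f, with V taken from (4) and (5). The 100 MHz grid used below is an assumption, since the fitted points are not stated. As a result, the coefficients come out close to, but not identical with, Table 3.

```python
import numpy as np

# Sketch of the Table 3 linearization: least-squares line through V^2 * f
# for each cluster, using the voltage fits (4)-(5). The 100 MHz grid is an
# assumption, so results are close to, not identical with, Table 3.


def v_a7(f):
    return 0.0000001963 * f**2 + 0.000004489 * f + 0.8886


def v_a15(f):
    return 0.0000001487 * f**2 - 0.0001167 * f + 0.9319


def linearize(volt_fn, f_min, f_max, step=100.0):
    """Return (slope, intercept, R^2) of a straight-line fit to V(f)^2 * f."""
    f = np.arange(f_min, f_max + step / 2, step)
    y = volt_fn(f) ** 2 * f
    slope, intercept = np.polyfit(f, y, deg=1)
    y_hat = slope * f + intercept
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return slope, intercept, r2


if __name__ == "__main__":
    print("A7 :", linearize(v_a7, 200.0, 1400.0))   # Table 3: 1.708 f - 385.2, R^2 0.9547
    print("A15:", linearize(v_a15, 200.0, 2000.0))  # Table 3: 1.648 f - 499.9, R^2 0.9372
```

Because V²f is convex in f, any straight line underestimates it at both ends of the range, which is the main source of the R² loss reported in Table 3.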
The final relationship is obtained as (16):


$$\dot{T}(t) = 0.00176397\,T(t) + \left[0.12397428\,\alpha C_7 f_7 + (-27.9595\,\alpha C_7 + 0.0000047426)\right] + \left[0.06025242\,\alpha C_{15} f_{15} + (-18.2768\,\alpha C_{15} - 0.001934072)\right] - 0.00095002\,u_{\text{fan}} \tag{16}$$
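The coefficients in (16) follow from (14) and Table 3. Substituting P = αC(k·f + c) + P_s into the term B·P gives B·k·αC·f + (B·c·αC + B·P_s). The sketch below recomputes the frequency and αC coefficients for both clusters so they can be compared with (16).

```python
# Sketch: reproduce the coefficients of (16) from (14) and the Table 3 fits.
# Substituting P = alpha*C*(k*f + c) + P_s into B*P gives
# B*k*alpha*C*f + (B*c*alpha*C + B*P_s); only the first two products are checked.

B7, B15 = 0.07258447, 0.03656092   # power coefficients from (14)
K7, C7 = 1.708, -385.2             # Table 3, little cluster
K15, C15 = 1.648, -499.9           # Table 3, big cluster

coef_f7 = B7 * K7      # multiplies alpha*C_7*f_7 in (16)
const_7 = B7 * C7      # multiplies alpha*C_7 in (16)
coef_f15 = B15 * K15   # multiplies alpha*C_15*f_15 in (16)
const_15 = B15 * C15   # multiplies alpha*C_15 in (16)

if __name__ == "__main__":
    print(f"{coef_f7:.8f} {const_7:.4f} {coef_f15:.8f} {const_15:.4f}")
```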


3.3. Optimal controller design

The purpose of optimal control design is to find a control signal u(t) over the time interval t ∈ [t_0, t_f] such that, when applied, the system achieves the desired performance while respecting the physical constraints. To formulate an optimal control problem, three elements must be specified: i) a mathematical description of the system in state-space form; ii) the physical constraints of the system; and iii) the performance index of the problem.


3.3.1. State-space description 
The state-space description follows from (16) and can be rewritten as (17), where u_7 = f_7 and u_15 = f_15:


$$\dot{X}(t) = AX(t) + \left[B_7(t)u_7(t) + w_7(t)\right] + \left[B_{15}(t)u_{15}(t) + w_{15}(t)\right] + B_{\text{fan}}u_{\text{fan}}(t) \tag{17}$$


The terms w_7 and w_15 are obtained by adding the constant parts of the linearized dynamic power to the leakage power (the leakage power relationship is given in [31]), and they can vary with the workload. The coefficients B_7(t) and B_15(t) are treated as time-varying because they depend on the activity factor of each cluster.


3.3.2. Physical constraints of the system 
After obtaining the mathematical model, the physical constraints on the state and control variables must be defined. These constraints are:


200 MHz ≤ f_7 ≤ 1400 MHz

200 MHz ≤ f_15 ≤ 2000 MHz

0 ≤ u_fan ≤ 255

On the ODROID-XU4 board, the default fan-speed control policy is shown in Table 4. The range [0, 255] corresponds to the 8-bit register of the PWM fan controller.


Table 4. Default fan speed control
Trip point          0     1     2
Temperature (°C)    45    50    55
Fan speed (PWM)     150   190   252
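The box constraints above and the default trip-point policy of Table 4 are simple to state in code, and the default policy is the baseline the proposed controller is compared against. The sketch assumes that the default policy keeps the fan off (PWM 0) below trip point 0 and that a trip point is reached at equality; Table 4 specifies neither.

```python
# Sketch: enforce the box constraints of Section 3.3.2 and reproduce the
# default trip-point fan policy of Table 4 for comparison.
# Assumptions: below trip point 0 (45 deg C) the default policy keeps the fan
# off (PWM 0), and a trip point is reached at equality; Table 4 states neither.

LIMITS = {
    "f7": (200.0, 1400.0),    # MHz, little cluster
    "f15": (200.0, 2000.0),   # MHz, big cluster
    "u_fan": (0.0, 255.0),    # 8-bit PWM register
}

TRIP_POINTS = [(55.0, 252), (50.0, 190), (45.0, 150)]  # (deg C, PWM), highest first


def clamp(name: str, value: float) -> float:
    """Project a control value onto its admissible interval."""
    lo, hi = LIMITS[name]
    return min(max(value, lo), hi)


def default_fan_pwm(temp_c: float) -> int:
    """Default stepwise policy: the highest trip point reached sets the PWM."""
    for threshold, pwm in TRIP_POINTS:
        if temp_c >= threshold:
            return pwm
    return 0  # assumed: fan off below the first trip point
```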


3.3.3. Performance index 

To evaluate system performance quantitatively, a performance index must be selected; the optimal system maximizes or minimizes this index. Since our goal is to maintain system performance while controlling temperature, we select the performance index (18):


$$J(u) = \int_{t_0}^{t_f} \left[\lambda + \gamma_7 u_7(t)^2 + \gamma_{15} u_{15}(t)^2 + \gamma_{\text{fan}} u_{\text{fan}}(t)^2\right] dt = \int_{t_0}^{t_f} g(t)\, dt \tag{18}$$


where J is the performance index, λ is a constant coefficient, γ is the weight of each control variable, and u is the control variable. Because the λ term accumulates over the elapsed time, a larger λ makes a long execution more costly relative to the frequency and fan penalties, which favors higher frequencies. The weights are chosen according to the required operating range of each controller, giving (19):


$$J = \int_{t_0}^{t_f} \left[0.5 + 0.000005\, f_7(t)^2 + 0.0000025\, f_{15}(t)^2 + 0.00025\, u_{\text{fan}}(t)^2\right] dt \tag{19}$$


In our problem, this function must be minimized, so it is referred to as the cost function.
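To make the trade-off in (19) tangible, the sketch below evaluates the cost on sampled trajectories with the trapezoidal rule. The weights are those of (19), with λ = 0.5. The trajectories in the usage example are hypothetical, and the trapezoidal rule is simply one reasonable way to approximate the integral from samples.

```python
# Sketch: evaluate the cost (19) on sampled trajectories with the trapezoidal rule.
# Weights are those of (19) (lambda = 0.5); the example trajectories are hypothetical.

WEIGHTS = {"lam": 0.5, "g7": 0.000005, "g15": 0.0000025, "gfan": 0.00025}


def integrand(f7: float, f15: float, u_fan: float, w=WEIGHTS) -> float:
    """g(t) in (18): time penalty plus quadratic control penalties."""
    return w["lam"] + w["g7"] * f7**2 + w["g15"] * f15**2 + w["gfan"] * u_fan**2


def cost(times, f7, f15, u_fan, w=WEIGHTS) -> float:
    """Trapezoidal approximation of J = integral of g(t) dt over the samples."""
    g = [integrand(a, b, c, w) for a, b, c in zip(f7, f15, u_fan)]
    total = 0.0
    for k in range(1, len(times)):
        total += 0.5 * (g[k] + g[k - 1]) * (times[k] - times[k - 1])
    return total


if __name__ == "__main__":
    t = [0.0, 5.0, 10.0]
    # Constant 1000 MHz on both clusters, fan off: g = 0.5 + 5 + 2.5 = 8 per second.
    print(cost(t, [1000.0] * 3, [1000.0] * 3, [0.0] * 3))
```

Running the same workload at higher frequency shortens the integration interval (lowering the λ contribution) while raising the quadratic penalties, which is exactly the balance that λ tunes.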


3.4. Optimal control theory 
To solve the optimal control problem, we seek an admissible control u* ∈ U that causes the system


$$\dot{x}(t) = Ax(t) + Bu(t) + w(t) = a(x(t), u(t), t) \tag{20}$$

to follow an admissible trajectory while minimizing the performance index. The Hamiltonian is defined as:


$$H(x(t), u(t), p(t), t) \triangleq g(x(t), u(t), t) + p^{T}(t)\left[a(x(t), u(t), t)\right] \tag{21}$$


where p(t) is known as the costate. The necessary conditions for u* to be an optimal control are:


$$\dot{x}^*(t) = \frac{\partial H}{\partial p}(x^*(t), u^*(t), p^*(t), t) \tag{22}$$
$$\dot{p}^*(t) = -\frac{\partial H}{\partial x}(x^*(t), u^*(t), p^*(t), t) \tag{23}$$
$$H(x^*(t), u^*(t), p^*(t), t) \le H(x^*(t), u(t), p^*(t), t) \quad \text{for all admissible } u(t) \tag{24}$$


Lemma: if

$$\frac{\partial H}{\partial u}(x^*(t), u^*(t), p^*(t), t) = 0 \tag{25}$$

holds and the matrix

$$\frac{\partial^2 H}{\partial u^2}(x^*(t), u^*(t), p^*(t), t) \tag{26}$$

is positive definite, then u*(t) satisfies a sufficient condition for a (local) minimum, which is used for the controller design. Finally, using these relations, the control law for the frequency controllers is obtained as (27):


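$$u_f(t) = \begin{cases} \dfrac{-(Ax + C) + \sqrt{(Ax + C)^2 + \lambda B^2/\gamma}}{B}, & x(t_1) \le x(t) \le x(t_2) \\ u_{\min}, & x(t_2) < x(t) \\ u_{\max}, & x(t) < x(t_1) \end{cases} \tag{27}$$

where A, B, and C are the state coefficient, input coefficient, and constant term of the controlled channel in (17), γ is that control variable's weight in (18), and

$$x(t_1) = \frac{\left(\lambda - u_{\max}^2\gamma - 2u_{\max}\gamma C/B\right)B}{2u_{\max}\gamma A}, \qquad x(t_2) = \frac{\left(\lambda - u_{\min}^2\gamma - 2u_{\min}\gamma C/B\right)B}{2u_{\min}\gamma A} \tag{28}$$

and for the fan speed controller:

$$u_{\text{fan}}(t) = \begin{cases} 255, & x(t_1) < x(t) \\ \dfrac{-Ax + \sqrt{(Ax)^2 + \lambda B^2/\gamma}}{B}, & 0 \le x(t) \le x(t_1) \\ 0, & x(t) < 0 \end{cases} \tag{29}$$

where

$$x(t_1) = -\frac{(\lambda - 65025\gamma)B}{510\gamma A} \tag{30}$$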
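The frequency law (27) can be read as the unconstrained expression u = (−(Ax + C) + sqrt((Ax + C)² + λB²/γ))/B, saturated at u_max below x(t_1) and at u_min above x(t_2), with the switching states from (28). This form is what one obtains from ∂H/∂u = 0 combined with H = 0 for a free-final-time problem; that derivation is our reading, not a statement of the authors' steps. The sketch assumes A > 0 and B > 0, so the unconstrained law decreases in x. All numbers in the usage check are hypothetical.

```python
import math

# Sketch of the saturated frequency law (27) with switching states (28).
# A, B, C: state coefficient, input coefficient and constant term of the
# controlled channel; lam, gamma: weights of (18). Assumes A > 0 and B > 0,
# so the unconstrained law decreases monotonically in the temperature x.


def unconstrained_u(x, A, B, C, lam, gamma):
    z = A * x + C
    return (-z + math.sqrt(z * z + lam * B * B / gamma)) / B


def threshold_state(u_bound, A, B, C, lam, gamma):
    """State x at which the unconstrained law equals u_bound, as in (28)."""
    return (lam - u_bound**2 * gamma - 2.0 * u_bound * gamma * C / B) * B / (
        2.0 * u_bound * gamma * A)


def frequency_law(x, A, B, C, lam, gamma, u_min, u_max):
    x1 = threshold_state(u_max, A, B, C, lam, gamma)  # below x1: run at u_max
    x2 = threshold_state(u_min, A, B, C, lam, gamma)  # above x2: run at u_min
    if x < x1:
        return u_max
    if x > x2:
        return u_min
    return unconstrained_u(x, A, B, C, lam, gamma)


if __name__ == "__main__":
    P = dict(A=0.5, B=2.0, C=0.3, lam=3.0, gamma=0.2)  # hypothetical values
    for x in (-5.0, 0.0, 10.0, 100.0):
        print(x, frequency_law(x, u_min=1.0, u_max=4.0, **P))
```

Because the unconstrained law is monotone under these sign assumptions, the piecewise form of (27) is equivalent to clipping the unconstrained value to [u_min, u_max]; the thresholds in (28) are exactly where the law meets the bounds.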


4. EXPERIMENTAL EVALUATION 

In this section, we examine the performance of the proposed control method with the ODROID-XU4 fan active and inactive, together with the execution time compared with the default controller. The ODROID-XU4 board uses the Samsung Exynos 5422 system-on-chip, which integrates four Cortex-A15 (big) cores and four Cortex-A7 (little) cores. The proposed method is evaluated on common workloads: MP3 playback, MP4 playback, and the Sysbench benchmark.


4.1. Experiment 1 
In this experiment, a Moving Picture Experts Group Audio Layer III (MP3) music file with a playing time of 3 min 45 s is played, and the temperature variations are recorded for different values of λ, as shown in Figure 2. Since this test is a moderate workload for the processor, the fan is deactivated. As can be seen in Figures 3 and 4, as the coefficient λ in the performance index increases, the clusters operate at higher frequencies, which increases the temperature. Note also that as the temperature rises, the proposed method decreases the core frequencies, and the extent of this temperature reduction depends on λ.


Figure 2. Temperature variations for different λ
Figure 3. Behavior of A7 frequency for different λ    Figure 4. Behavior of A15 frequency for different λ


4.2. Experiment 2 

In the second experiment, an MP4 file with a playing time of 60 s is played. To evaluate the control signals, this experiment is done in two states: without the fan and with the fan. With the fan deactivated (Figure 5), the priority of the proposed method is to reduce the temperature. Because the workload and the controller run simultaneously, the controller overhead causes the temperature to rise rapidly at first. Afterwards, however, the slope of the temperature curve shows that the proposed method reduces the rate of temperature rise relative to the default state. Figures 6 and 7 show the variation of the A7 and A15 core frequencies, respectively, for different values of λ when the fan is not active.

With the fan activated, the goal of the proposed method is to optimize performance under temperature constraints, as shown in Figure 8. The initial overhead in this curve is higher than in Figure 5. This overhead is expected because the fan speed adjustment commands are added to the previous control commands.

In Figure 8, although at λ = 7 the temperature is 3.5 °C higher than the default, Figure 9 shows that the fan runs at about 20% lower speed. By tuning λ appropriately, this temperature difference can be compensated while the fan still uses less power than in the default state. Reducing fan speed also increases fan lifetime, so developers and programmers can trade off fan cooling power against temperature reduction. Figures 10 and 11 show the variation of the A7 and A15 core frequencies, respectively, for different values of λ.
Figure 5. Temperature variations for different λ
Figure 9. Behavior of fan speed for different λ
Figure 10. Behavior of A7 frequency for different λ    Figure 11. Behavior of A15 frequency for different λ


Note that the core frequency can take only the values 200, 300, ..., 2000 MHz, because the board's software applies frequencies only from a set of predefined values. For this reason, only these predefined frequency values appear in Figures 3, 4, 6, 7, 10, and 11. The commands for changing the frequency of core 1 in the Linux shell are shown in Figure 12. The frequency values in these cpufreq files are in kilohertz (kHz).


[Terminal screenshot (root@odroid:/home/odroid/Desktop): commands accessing the cpufreq files under /sys/devices/system/cpu/cpu0/cpufreq/, such as cpuinfo_cur_freq]

Figure 12. Commands written in the Linux command line
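Because the board only accepts predefined frequencies, a continuous controller output has to be mapped onto those steps before it is applied. The sketch below rounds down to a 100 MHz step, clamps to the cluster range, and converts to the kHz value that cpufreq sysfs files use. Several points are assumptions here: cpu0 to cpu3 are taken to be the A7 (little) cores and cpu4 to cpu7 the A15 (big) cores, the `userspace` governor is assumed to be active so that `scaling_setspeed` is writable, and writing requires root on the board. The helper names are hypothetical.

```python
from pathlib import Path

# Sketch: map a continuous controller output (MHz) onto the predefined
# frequency steps and the kHz value used by the Linux cpufreq sysfs files.
# Assumptions: 100 MHz steps, cpu0-cpu3 = A7 and cpu4-cpu7 = A15, and the
# "userspace" governor active so scaling_setspeed is writable (root needed).

CLUSTER_RANGE_MHZ = {"A7": (200, 1400), "A15": (200, 2000)}
CLUSTER_FIRST_CPU = {"A7": 0, "A15": 4}


def quantize_mhz(f_mhz: float, cluster: str, step: int = 100) -> int:
    """Round down to the nearest predefined step, then clamp to the cluster range."""
    lo, hi = CLUSTER_RANGE_MHZ[cluster]
    stepped = int(f_mhz // step) * step
    return max(lo, min(hi, stepped))


def setspeed_path(cluster: str) -> Path:
    cpu = CLUSTER_FIRST_CPU[cluster]
    return Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_setspeed")


def apply_frequency(f_mhz: float, cluster: str) -> int:
    """Write the quantized frequency (in kHz) to sysfs and return the value written."""
    khz = quantize_mhz(f_mhz, cluster) * 1000
    setspeed_path(cluster).write_text(str(khz))
    return khz
```

Rounding down, rather than to the nearest step, is a conservative choice: the applied frequency never exceeds what the controller requested, so quantization cannot add heat.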


4.3. Experiment 3 

The effectiveness of the proposed control in reducing temperature was evaluated in the two previous experiments. In this section, the performance of the board is measured by runtime, recorded with the Sysbench benchmark. As λ increases, the runtime of the benchmark decreases. According to Table 5, compared with the default controller, the runtime for λ = 0.5 is about 2× longer, while λ = 12 gives a speedup of about 1.0375× (14.9750 s versus 15.5376 s).


Table 5. Impact of λ on runtime
λ             0.5       2         5         7         10        12        default
Runtime (s)   30.9399   22.9648   17.5474   16.2315   15.2118   14.9750   15.5376
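The ratios quoted above can be recomputed from Table 5. The sketch below expresses each runtime as a speedup over the default controller, where a value above 1 means the proposed controller finished faster.

```python
# Sketch: runtime ratios from Table 5 relative to the default controller.
# A speedup above 1 means the proposed controller finished faster than the default.

RUNTIME_S = {0.5: 30.9399, 2: 22.9648, 5: 17.5474, 7: 16.2315,
             10: 15.2118, 12: 14.9750}
DEFAULT_S = 15.5376


def speedups(runtimes=RUNTIME_S, default=DEFAULT_S):
    return {lam: default / t for lam, t in runtimes.items()}


if __name__ == "__main__":
    for lam, s in speedups().items():
        print(f"lambda = {lam:>4}: speedup {s:.4f}x, runtime ratio {1 / s:.4f}")
```

Only λ = 10 and λ = 12 outperform the default in this table; smaller λ values trade runtime for lower temperature.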


These results suggest that the control strategy can be extended to trade off energy savings against temperature control in multiprocessor systems. Note that higher performance requires higher power consumption; as a result, higher performance comes at the cost of higher temperature and energy consumption.
5. CONCLUSION

With ever-increasing computational demand, multiprocessor systems undergo tremendous thermal stress that has a detrimental effect on system reliability. Many of the proposed DTM techniques do not consider the frequency and the fan speed as control variables simultaneously and are limited to one of them; consequently, these techniques can be considered incomplete. In this paper, we address this problem with optimal control methods. In addition, by obtaining an accurate thermal model, the control goals of temperature and performance are examined by changing the λ coefficient in the cost function. The choice of this coefficient strongly affects the behavior of the controllers. Experiments were performed to demonstrate the effectiveness of the proposed method for various purposes. The first and second experiments show that the proposed algorithm can reduce the temperature and also reduce the fan speed by about 20%. The final experiment demonstrates the effectiveness of the proposed method in terms of performance. In future work, we plan to use a multiprocessor system with temperature sensors on all cores to examine the thermal interaction between cores and improve the efficiency of the proposed control method.